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Abstract 

We study the problem of estimating time- varying coefficients in ordinary dif- 
ferential equations. Current theory only applies to the case when the associ- 
ated state variables ar e observed without measurement errors as presented in 
Chen and Wu fl2008al lbh. The difficulty arises from the quadratic functional 



of observations that one needs to deal with instead of the linear functional 
that appears when state variables contain no measurement errors. We de- 
rive the asymptotic bias and variance for the previously proposed two-step 
estimators using quadratic regression functional theory. 

Key words: differential equation, local polynomial regression, 
measurement error, varying-coefficient models. 



1. Introduction 

Ordinary differential equations (ODEs) are widely used to describe sys 



terns i n physics, chemist ry, biology and medicine (j Gardner et all 120031 ; ICao and Zhaol . 



20081 : iMiao et all . l2009h . These ODEs usually involve quite a few unknown 



parameters that need to be estimated from observational data. Thus unlike 
traditional studies of dynamical systems that seek solutions for the equations, 
here we are concerned with the inverse problem of estimating the equations 
themselves given state variable measurements. Unfortunately, most ODE 
systems used in these applications are often complicated in form and thus do 
not entertain analytical solutions. Besides, the observations typically contain 
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measurement errors and statistical methods are required to estimate these 
parameters. 

In general, such system can be written as 

dK(t) 



dt 



F(X(t),/3(t),V,a), 



(1) 



where X(£) = (Xi(t), . . . , X p (t)) T are time-varying covariates, V are non- 
time- varying covariates, and {3(t),a are time- varying and non-time- varying 
parameters respectively. F is assumed to be known. We also assume t G [0, 1] 
without loss of generality. However, we do not observe X(i) directly. Instead 
we have noisy observations 



y, = xfe) + a, 

where Y, = (Yu, . . . , Y pi ) T are our actual observations and = (en, 



(2) 



are the mean zero measurement errors assumed to be independent and iden- 
tically distributed. 

Because of the importance of this problem, it has been investigated by 
many researchers. One approach uses classical parametric infe rences such as 



the n onlinear least square or maximum likelihood estimation (IBiegler et al. 



19861 ). In this approach, optimization usually involves an iterative process, 
and requires usin g numerical methods such as Euler or Runge-Kutta. Simi- 
larly, inferences in lGelman et al.l (119961 ) are based on Bayesian principle aided 
with Markov chain Monte Carlo methods for posterior exploration. This ap- 
proach is computationally intensive since numerical approximations to the 
solutions are required for each update of the parameters. 

Estimation of equation paramete rs that does n ot require numerical solu- 
tions has been proposed as early as IVarahl (119821 ). but seems to be largely 
ignored until recently. In this two-step approach, X and t heir d erivatives are 
first estimated using a nonparametric smoother (jVarahl (119821 ) used splines 
as the smoother), and in the second step the parameters in the ODEs are 
found based on minimizing the squared difference of the two sides of equation 
([T]) when the estimated covariates and their derivatives are plugged into the 
expression. This gene ral approach is simple t o implement and is taken up 
in so me recent works ( Chen and Wu . 2008al lbl; iLiang and Wu . 2008 ; Brunei . 
20081 ) where besides splines some of these authors used the local polynomial 
regression method. 



In another work, iRamsay et al.l (120071 ) proposed a new method called 
the generalized profiling procedure. In this approach, the ODE solution 



2 



is approximated by splines and both the coefficients of the basis functions 
and the unknown parameters in the ODEs are estimated by minimizing a 
penalized smoothing functional, which reflects a trade-off between fitting the 
data and satisfying the ODE model. 

Both approaches described above do not required numerical solutions of 
ODE and have their respective advocates. Here we take the approach of the 
former, in particular IChen and Wu (l2008al ]bh. and provide some new asymp- 
totic results for a special case of ([1]) that has not been attacked before. In 
particular, we consider the following ODE involving time- varying coefficients: 



dX x {t) 
dt 



(3 T (t)X(t), 



(3) 



where (3(t) = (Pi(t), . . . , (3 p {t)) T are time- varying coefficients and all Xa(t), 1 < 
d < p, are observed with measurement errors as in equation (121) . Extension 
to multiple ODEs is straightforward although cumbersome in notation. We 
can also incorporate non-time-varying coefficients and covariates but it is 
regarded as simpler to analyze so we do not consider these cases. 

As far as we know, the asymp totic properties for m odel ([3]) are nonexis- 
tent. For the method proposed in lRamsav et al.l (120071) an d the more recent 
asymptotic analysis for this approach (jOi and Zhaol . 120091 ). only models in- 
volv ing finite-dimensiona l parame t ers ar e discussed. For the two-step meth- 



ods, |Liang_andWu (2008); 



parameters. 



Chen and Wu 



Brunell (120081 ) also only consider non-time- varying 



(j2008al ) consider the model 



d=l 



where the functional covariates Z^it) associated with the time- varying coeffi- 
cients are observed w i thout m easurement errors and the function g is known. 
While Chen and Wu (l2008bh discussed a very general model 



dX{t) 
dt 



F(X(t),(3(t)) 



(4) 



where F is known, their theoretical analysis is again only limited to a very 
special case 

^=/3( t )-aX( t ), 
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where the time-varying coefficients are not associated with covariates con- 
taining measurement errors and the constant a is known. The avoidance of 
these authors to analyze model ([3]) already alludes to the associated difficul- 
ties, and this is what we set out to demonstrate in this paper. 



2. Asymptotic bias and variance 

Our problem is defined by equations (J2J) and (j3J), but with the extra com- 
plication that the state variables are observed in m independent experiments 
(say with different initial values) resulting in m noisy trajectories for each 
state variable. More specifically, we make observations 

Ydii = XdiiU) + tdii, 1 < d < p, 1 < I < m, 1 < % < n, 
where the state variables obey the ODEs 



d =l 



Later we will use the notations Y d i = (Yon, . . . , Y d i n ) T , e dl = (e d n } . . . , e d i n ) T 
and Xi(t) = (X u (t), . . .,X p i(t)) T . Note for simplicity we assume the obser- 
vation times are the same for all p state variables and all repeats X d i, 1 < 
d < p, 1 < I < m. Using a two-step approach, we first estimate X d i(t) and 
the first derivative of X u(t) separately using the local polynomial estimator 
(IFan and Gijbela . 120031 ) . Based on Taylor expansion, X d i{t) is approximated 
by 

Xdi(t) w a + ai(t - t ) + . . . + a q (t - t ) q , 

for observation time t close to a fixed point to- Using a kernel function K 
with a bandwidth h for localization, the local polynomial estimator can be 
obtained by minimizing the criterion 



Y^(Y dh -J2dr(t t -t ) r ) 2 K( 



r=0 



U — to . 
h ■ 



resulting in solution 



where 



T 



{^WTY^WYdu 
/ 1 (ti - t„) • • • (ti " tof \ 

\1 (t n - t ) • • • (tn - t ) 9 ) 
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and W = diag(K(^ s -), . . . ,K(^f^-)). In particular, we can estimate X dl 
and its derivative X' dl = dX^i/dt (only the derivative of Xu will be used 
though) by 

n 

XM = w ((k - t )/h)Y dli , (5) 
1=1 

and 

n 

X*(t )=^W 1 ((t i -t )/h)Y dli , (6) 
i=i 

where W u (t) = el q+1 (T T WT)-\l, ht, . . . , h q t q ) T K(t), is = 0,1 and e v , q+1 is 
the (q + 1) dimensional unit vector having 1 as the [y + l)th component, 
otherwise. 

In the second step, we substitute the estimates X d i and X' u in the differ- 
ential equation model and try to estimate the unknown coefficients (3(t) = 
(Pi(t), . . . ,(3 p (t)) T . Again one uses local polynomial regression in this step. 
Around a fixed point t Q G (0, 1) and approximating (3 d (t) by 

Pd(t) = [3 d0 + Pa(t-t ) + ... + (3 dq (t - to) 9 , 



we obtain the local polynomial estimator (3{t) by minimizing the locally 
weighted functional 





n m / p q 

EE ^)-DE^ - 

i=l l=\ \ d =l r=0 


t ) r )x dl (u) 


j K((U-t )/h). 


Let 












( Xnih) 


■■■ (*1 - * ) 9 *ll(*l) 


X 21 {h) ■ 


■ ■ {h-t^yx^) \ 


Z = 


Xu(t n ) 


••• (t n -t ) q X u (t n ) 


X2l(t n ) ■ 


(t n — to) q X p i(t n ) 




Xi m (ti) 


■ ■■ (h- uyx^h) 


X2m(tl) ■ 


(ti — to) q X pm (ti) 




\ Xi m (t n ) 


(tn — to) q Xi m (t n ) 


X2m(t n ) ■ 


■ ■ (t n — to) q X pm (t n ) J 


(of dimension mn x p(q + I )) and let Y = 
of the above can be written as 


-(xiih),... 


,X[(t n )) T , the solution 



{Z T WZ)- 1 Z T WY, 
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which contains estimates of /?d (to ) ? ^ ^ d < p together with their derivatives, 
where W = I m <8> W = diag(W, . . . , W) is the mn x mn diagonal matrix of 
local weights, ® denotes the Kronecker product and I m is the mxm identity 
matrix. Since we are only interested in Pd{to), we have the local polynomial 
estimator 

P{t ) = (I p ® el q+1 )(Z T WZ)- l Z T WY. (7) 

Note we could use different orders of polynomial and different bandwidths 
or even different kernels for the two steps, but we will avoid discussion on 
these issues since our notation is already very complicated and the results in 



Chen and Wul (l2008al ) seem to suggest that these more flexible choices will 
not affect the asymptotic order of the estimators except for multiplicative 
constants for bias and variance. 

We first state some standard assumptions that are used throughout the 
paper, which are always implicitly assumed even without mentioning. Our 
asymptotic results consider m and Xdi{-) as fixed (or, conditional on Xdi{-)) 
and let n, the number of time points, go to infinity. 

(i) The kernel K is a continuous, bounded and symmetric probability den- 
sity function, with a support on [—1,1]. 

(ii) The state variables X<n(t), 1 < d < p, 1 < / < m, as well as the time- 
varying coefficients /3d{t), 1 < d < p, are all three times differentiable 
with continuous derivatives. 

(iii) The mean zero measurement errors e^u, l<d<p,l<l<m,l<i<n 
are independent and identically distributed with finite fourth moment 
and its variance is denoted by Ee 2 = a 2 . 

(iv) The observation time points ti, 1 < % < n, are independent and identi- 
cally distributed with density function / supported on [0, 1], which is 
continuously differentiable and bounded away from zero. 

(v) The bandwidth h satisfies h —> and nh 3 — > oo. 

(vi) Local quadratic regression is used, that is, q — 2. 

We use several lemmas to simplify the presentation of our main results. 
First we have the following simple lemma concerning Z r WZ, which appears 
in (H). 

Lemma 1. Z T WZ = nhf(t )[(ET=i X i(to)^i(to) T ) ® HSH}(1 + o P (l)), 
where H = diag(l, h, ... , h q ) and S is a (q + 1) x (q + 1) matrix whose 
entry is J y t+ ^~ 2 K{y)dy. 
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Proof. Note Z T WZ can be written as 

n m 

2(X,(*i) ® Ti)K((ti - £ )//i)(X^) (8) 



T 



i=l z=i 



where Tj = (1, £ 4 — i , . . . , — to) 9 ) T - Using the law of large numbers, one 
can show Y!i=i IXi( x *(^) ® Ti) K {{U ~ to)/h)(Xi(ti) ® T;) T (i.e., if the co- 
variates are observed without error) is equal to w/i/(£o)E™i X/(to)X/(to) T ® 
-/YS7J](1 + o p (l)). The lemma easily follows from \Xdi(t) — Xdi(t)\ = op(l). 

□ 

The following property is well-known (IHuang and Fan! Il999l ; iFan and Yaol . 



20031 ) and is stated here only for completeness. 



Lemma 2. For the weights Wo, W\ defined immediately after (TjJ) and 
we have 

sup sup \nh v+1 W v {t) - K v {t)/f{t )\ = o P (l), 
te[-i,i] *oe [o,i] 

where K u {t) = e^S^l, t,..., P)K(t), u = 0,l. 

Next we deal with the p x (q + 1) dimensional vector Z r W7. First we 
can write 

Z T WY = jZiMU) ® T^(^)X^). 
i=i z=i 

A general component of this column vector is 

n m t - t 

J2 z2& ~ toYXMKi^—^^iU), 0<r< qi l<d<p. 

i=i i=i n 

Note the appearance of X d i(ti) and X' u (ti) together in each term of the sum 
is probably what deterred the researchers from studying its property. 
Using (jHJ) and (jEJ), the above displayed expression is written as 



l<i,j,fc<^ l<Z<m I 



where the (j, fc) entry of the n x n matrix A r , < r < g, is defined to be 

X> - «oW(^W^«^)- (8) 
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The following asymptotic properties of A r are most important in deriving 
our main results. 



Lemma 3. 



(ii) 
(Hi) 

(iv) 

(v) 
(vi) 



tr(A r ) 
tr{A 2 ) 
tr{A r A T r ) 

X diA r AjX u 

Xdl^ArXu 



(C + o P (l))h r - 1 
{C + opil))^- 1 
(C + o P (l))/i 2r - 1 

nh r+1 f(t )X dl (t )X' u (t ) J y r K(y)dy + (C + o P (l))nh r+3 , l<d<p 

(C + o P (l))/i 2r - 1 + (C + o P (l))nh 2 
(C + o P (l))/i 2r - 1 + (C + op(l))n/i 2 



where in the above expressions, different appearances of C denote different 
constants depending on the kernel K and time points density f . 



Proof . The results in the lemma are similar to those found in lHuang and Fan 



(119991 ) . in particular their equations (7.3), (7.6), (7.11) and (7.19). Our re- 
sults are different in that we consider dense time points t\, . . . , t n while they 
consider estimation of some integral so that integrations should be replaced 
with summ ations in our case. Bes ides, we consider product of Wq and W\ in 
([S]) while in lHuang and Fan! (119991 ) only expressions s uch as W 2 appea r . Ney - 
ertheless, the calculations involved are very similar to lHuang and Fanl (119991 ). 
if not slightly more cumbersome. We only briefly consider the calculation of 
tr(A r ) in the following. 

Using Lemma [2j we can write 



tr(A r ) 



8=1 k=l 
n n 

i=l k=l 

tk — t 



■tk 



h 



tf 



) 



Koi\^) + o P (^I^ til<h} )/(f(f l )nl>) 



h 



) + o P (l)I {ltk „ ti]<h} ) /(f(U)nh 2 ) ■ K(p-^) 



i=i 
N- 1 



U — to . 



u r K(u)du 



h 



)/(f(U)nh 2 ) / Koi^K^du-il + opil)) 



op(1)) 



K (u)Ki(u)du 



and the result on tr(A r ) is proved. One can see t hat the calculation stra tegies 
are quite similar to equations (7.2) and (7.3) in lHuang and Fan! (jl999j). □ 
Now we can state and prove the main result in this paper. 



Theorem 1. We have the following conditional bias and variance for j3(to): 
E0 d (t o ) - A,(* )|fi, • • • ,tn) = (Ci + o P (l))h 2 + (C 2 + op(1))4? 

Var0 d (t o )\t l7 ...,tn) = (C 3 + op(1))^t3 + (C 4 + o P (l))- 
for some constants G\, C 2 , C 3 and C 4 . 

Proof. As observed above, a general component of Z T ~WY can be written 
as Ez Y JA Y n = ^(XtfjA-Xu + e S^rXii + Xj;v4 r ei Z + e%A r eu), where 
X^ = (Xdi(ti), . . . ,Xdi(t n )) T denotes the unobserved states. Using these 
expansions, for d — 1, the conditional expectation of Yj^Yu is X^y4 r X i; + 
cr 2 tr(A r ) and the conditional variance is 4cr 2 X^(y4 r + Aj) 2 X.u + cr 2 tr((A r + 
Aj) 2 )/2 + (Ee A - 3a 2 ) J27=i ^ii,r> where A 2 i r are the diagonal entries of A r , 
while if d 1 the conditional expectation is X^A r Xiz and the conditional 
variance is a 2 [X^A^ A r Xu + X%A r A^Xu + tr(A r Aj)}. 

Based on Lemma [3] and the above discussion, we can write 

rn 

Z ?WY - nhf(t )[J2 X,(to)*«(*o) ® w] - a n = P (b n ), (9) 
1=1 

where w = ( J K(y)dy, h f yK(y)dy, . . . ,h q J y q K(y)dy) T is obtained from 
Lemma [3] (iv), the px (g + 1) dimensional vector a n is the bias term, and b n 
is a p x (q + 1) dimensional vector containing the standard deviation terms, 
both of which can be found from Lemma [3j The details are omitted here to 
avoid messy notations. 

Finally, incorporating Z T WZ, we note 
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in 



(I P ® ^ 2 T W7 - n^/(t )[(X; x i(*o)^,(to)) ® w] 

0(t o ) - (/, ® e^ +1 )^- T [(^X i (to)XKto) T )- 1 ® (tfStf)- 1 ] x 



n/i/(t )E X K*o)^u(*o) ® w] -a„ 
z=i 

= /3(t ) - P(t ) - O p ((I p ® ej g+1 )a n /n/i) 

The asymptotic bias and variance is thus derived from (Q. □ 

Remark 1. After finding the conditional asymptotic bias and variance, it is 
possible, under suitable c onditions, to pro v e asy mptotic normality of (3(t ), 
following the strategies in \Huang and Fan 

Remark 2. The bias and variance calculated depends on our assumptions 
that Xdi is three times differentiable and local quadratic regression is used. It 
is possible to extend the results and get other rates when we make different 
assumptions on the order of smoothness of X^i and use local polynomial with 
different orders. 



3. Conclusion 

In this paper we investigated some asymptotic properties of the two- 
step estimation in ODE where the time-varying coefficients are associated 
with noisy state variables. Asymptotic bias and variance for the estimator 
are found. The results presented here complement the existing results in 
differential equation models and make the theory more complete. The open 
questions include data-driven selection of the bandwidth which has not been 
investigated in this case and confidence interval construction. Finally, we 
think some extensions are possible. For example, one can use a known link 
function other than the identity and consider asymptotic theory for (jJJ). 
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